Abstract

Pattern recognition involves extracting features, transforming/normalizing them, and then performing classification 
in order to assign new or predefined categories to the entities of interest. In this work, we focus on the transformation 
and normalization of features. The concept of morphing a set of points and visualizations of respective displacement 
fields are introduced and applied in order to visualize and better understand the possible effects of feature transformation 
and normalization. When applied with care, these operations have the potential to enhance the classification stage. 
However, we illustrate that feature transformations and normalizations can also create false clusters as well as merge 
existing clusters, so that special attention is required when performing these operations. Some important feature 
transformations and normalizations, including the standardization procedure as well as principal component analysis 
and linear discriminant analysis, are also presented and briefly discussed. 


“The eye does not see things but images of things that mean other
things.”

Italo Calvino.


1 Introduction 

Pattern recognition (e.g. DM) is much more general
and widespread than often realized, as it underlies most
of the intelligence and activities of humans (as well as of
other living beings). As a consequence, pattern recognition
has an ample range of applications, extending from quality
control to text and image interpretation.

Figure 1 depicts the main data and operations involved
in pattern recognition. First, the patterns (or entities)
to be recognized have to be somehow generated (e.g. 0)
on the basis of respective parameters. Then, a selected set
of features is extracted from the patterns so as to provide
a quantitative characterization of the entities to be
recognized (e.g. 0). This step is particularly challenging,
as the selection of features is not straightforward and
depends on previous experience with the data, measurements,
and classification methods (e.g. [3]). The extracted
features can then be transformed or normalized with the
objective of deriving new features capable of improving
the characterization of the entities. Feature transformations
can take each feature independently or combined with
other features. For instance, transformations can be used
to remove noise from features, to make them smoother,
to obtain more discriminative measurements, to reduce
the dimensionality of the feature space, etc. Normalizations
are often required in order to remove translation,
rotation or scaling effects from the features, providing a
more standardized set of features. The final step in our
diagram consists of classifying the entities, i.e. assigning
previously defined or new categories to them, based on
their transformed/normalized features.

The present work focuses on the important stage of
feature transformation and normalization. These two
tasks, which can have important effects (wanted and
unwanted) in pattern recognition, are indeed unavoidable,
as the very implementation of any measurement implies
some choice related to transformation and normalization.
For instance, if we measure the weight of fruits
to be used as a feature, we are immediately
faced with the question of which unit to adopt, such
as grams, ounces, etc. In addition, the very definition of
features often involves transformations combining other
features, such as the ratio between the standard deviation
and mean of a given measurement (known as the coefficient
of variation).

We will start by introducing the concept of morphing
a set of points, which will be subsequently applied to
illustrate the concept and effects of feature transformation
and normalization. Indeed, most feature transformations
and normalizations can be conceptualized in terms of
morphing and respective vector fields, which help to visualize
and understand the respective effects, such as creating
false clusters that did not originally exist and merging
clusters that should remain separated. The provided
examples motivate the need for special care and attention
when transforming and normalizing features.

Next, we will discuss independent feature transformation,
in which each feature is transformed into a respective
new feature without considering other features.
Transformations involving the combination of features are covered
subsequently. The especially important normalizations
involving the minimum/maximum values of the features, as
well as the standardization procedure, are then presented
and illustrated. The feature normalizations
known as Principal Component Analysis (PCA), which is
an unsupervised methodology, and Linear Discriminant
Analysis (LDA), a supervised method, are then
introduced and discussed. Both of the latter normalizations
involve linear combinations of the original features.

2 Morphing a Set of Points 

The term morphing can be understood as changing the
shape of a given geometric structure, as done by the distorting
mirrors in amusement parks. In this section we present the
concept of morphing a set of N discrete points in $\mathbb{R}^2$, each of
them represented in terms of the respective coordinates
(x, y). The extension to higher dimensions is straightforward,
though the respective visualization of the morphing
operation becomes more challenging.

Each given point (x, y) is transformed into a new point
$(\tilde{x}, \tilde{y})$ by the morphing operation. We will consider
that the morphing is implemented through two respective
scalar fields $S_x$ and $S_y$, which are both functions of x
and y, i.e.

$$
\begin{aligned}
\tilde{x} &= S_x(x, y) \\
\tilde{y} &= S_y(x, y)
\end{aligned}
\tag{1}
$$

The above concept can be directly extended to points
defined by higher dimensional feature spaces, as is typically
the case in pattern recognition:

$$
\tilde{f}_i = S_i(f_1, f_2, \ldots, f_M) \tag{2}
$$

for i = 1, 2, ..., M features. For simplicity’s sake,
we will illustrate the morphing approach with respect to
points [x, y] in $\mathbb{R}^2$.

In other words, the morphing operation moves each
original point to a new position in the feature space.
The morphing operation can be more conveniently visualized
by decomposing it in terms of the original position
of the point, i.e. [x, y], and the respective displacement
$[D_x(x, y), D_y(x, y)]$:

$$
\begin{aligned}
\tilde{x} &= x + D_x(x, y) \\
\tilde{y} &= y + D_y(x, y)
\end{aligned}
\tag{3}
$$

This can be understood as transforming the action of
the scalar fields $S_x(x, y)$ and $S_y(x, y)$ into the effect of a
respective vector field $[D_x(x, y), D_y(x, y)]$.

Figure 2 illustrates this decomposition.

It follows from Equations 1 and 3 that:

$$
\begin{aligned}
D_x(x, y) &= S_x(x, y) - x \\
D_y(x, y) &= S_y(x, y) - y
\end{aligned}
\tag{4}
$$

The displacement vector $[D_x(x, y), D_y(x, y)]$ provides
an interesting conceptual visualization of the transformation
effect, which will be used in this work to illustrate the effect
of several feature transformation/normalization operations.
The following section provides some examples of how
the application of these operations can substantially
change the cluster structure in feature spaces.
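
The relation between scalar fields and displacement fields can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the contraction used as scalar fields are assumptions made for the example, not part of the original text.

```python
def displacement(S_x, S_y, x, y):
    """Displacement [D_x, D_y] obtained from the scalar fields S_x and S_y."""
    return S_x(x, y) - x, S_y(x, y) - y


def morph(points, S_x, S_y):
    """Move every point (x, y) to (S_x(x, y), S_y(x, y))."""
    return [(S_x(x, y), S_y(x, y)) for x, y in points]


# Hypothetical scalar fields: a uniform contraction toward the origin.
def S_x(x, y):
    return 0.5 * x


def S_y(x, y):
    return 0.5 * y


points = [(2.0, 4.0), (-1.0, 0.0)]
morphed = morph(points, S_x, S_y)

# Original position plus displacement recovers the morphed point.
for (x, y), (xt, yt) in zip(points, morphed):
    dx, dy = displacement(S_x, S_y, x, y)
    assert (x + dx, y + dy) == (xt, yt)
```

Keeping the displacement as a separate function mirrors the decomposition used in the text: the same field can be plotted as arrows or added to the points.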

3 Feature Transformations

Let the points [x, y] be transformed by the following
displacement field:

$$
\begin{aligned}
D_x(x, y) &= -x^2 \, \mathrm{sgn}(x) \\
D_y(x, y) &= -y^2 \, \mathrm{sgn}(y)
\end{aligned}
\tag{5}
$$

Figure 3 depicts the action of the above point transformation
on a circularly bounded set of uniformly distributed
points. As summarized by the visualization of the applied
displacement field, the original points moved toward the
center of the coordinate system. This happened more
noticeably with the points near the circular border, as
the magnitude of the applied field is stronger at those
positions, see Fig. 3(b). As a consequence, the density
near the border of the obtained point distribution is
found to be higher than that near the origin of the
coordinate system, which had its density nearly preserved.
The very border of the region where the points were
originally contained also changed shape as a consequence of the
transformation.

Figure 1: The several stages involved in the endeavor of pattern recognition. First, patterns are generated according to respective parameter
configurations. Then, features are extracted from those patterns in order to provide respective quantitative representations. Though a
one-dimensional feature space $f_i$ is shown in this diagram for simplicity’s sake, higher dimensional spaces are typically involved in pattern
recognition. These features can subsequently be transformed and/or normalized (red arrows), which can involve combinations of the previous
features, yielding a new feature space $\tilde{f}_i$. The transformed/normalized features are then fed to a classification method, which will then
hopefully provide the respective correct categories. The present work focuses on the feature transformation and normalization stage.

Figure 2: The morphing from point [x, y] (blue vector) into the point
$[\tilde{x}, \tilde{y}]$ (orange vector) can be decomposed in terms of the vector sum
of the original point position and the respective displacement vector
$[D_x(x, y), D_y(x, y)]$ (green vector).

This first example corroborates the fact that applying 
transformations to features can have substantial effects 
on the obtained density and distribution of points. 
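
A minimal Python sketch of this first example follows. It reads Eq. 5 as a displacement field with components -x² sgn(x) and -y² sgn(y). The disc radius (0.5), the number of points, the random seed and all function names are assumptions chosen for illustration.

```python
import random


def sgn(v):
    """Sign of v as -1, 0 or 1."""
    return (v > 0) - (v < 0)


def field(x, y):
    """Displacement read from Eq. 5 (assumed form)."""
    return -x * x * sgn(x), -y * y * sgn(y)


def morph(points, field):
    """Move every point by the displacement given by field(x, y)."""
    moved = []
    for x, y in points:
        dx, dy = field(x, y)
        moved.append((x + dx, y + dy))
    return moved


def sample_disc(n, radius, rng):
    """Uniformly distributed points inside a disc, by rejection sampling."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            pts.append((x, y))
    return pts


def edge_fraction(points, extent):
    """Fraction of points whose |x| lies in the outer 20% of [-extent, extent]."""
    return sum(abs(x) > 0.8 * extent for x, _ in points) / len(points)


rng = random.Random(0)                   # arbitrary fixed seed
original = sample_disc(2000, 0.5, rng)   # radius 0.5 is an assumption
moved = morph(original, field)

# With radius 0.5 the horizontal extent shrinks from 0.5 to 0.25.
edge_before = edge_fraction(original, 0.5)
edge_after = edge_fraction(moved, 0.25)
```

Under these assumptions, `edge_after` should come out larger than `edge_before`, reflecting the higher density near the border of the transformed set.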

Figure 4 illustrates another important effect that can
be observed when applying feature transformations. In
this case, the original points were subjected to a displacement
field that forced convergence at two distinct centers,
namely [-3, -3] and [3, 3]. Each of these centers was
induced by a respective Gaussian distribution of displacement
magnitudes.

This example illustrates the situation in which false
clusters can arise as a consequence of feature transformations.
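
The creation of false clusters can be reproduced with a short Python sketch. The two centers are those given in the text, while the exact shape of the field (the vector toward each center scaled by a Gaussian weight), the width `SIGMA`, the square region and the seed are assumptions made for this illustration.

```python
import math
import random

CENTERS = [(-3.0, -3.0), (3.0, 3.0)]   # centers stated in the text
SIGMA = 2.0                             # assumed width of the Gaussian weight


def field(x, y):
    """Sum of radial displacements toward each center, Gaussian-weighted."""
    dx = dy = 0.0
    for cx, cy in CENTERS:
        w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
        dx += w * (cx - x)
        dy += w * (cy - y)
    return dx, dy


def morph(points, field):
    """Move every point by the displacement given by field(x, y)."""
    moved = []
    for x, y in points:
        dx, dy = field(x, y)
        moved.append((x + dx, y + dy))
    return moved


def count_near(points, center, radius=1.0):
    """Number of points within the given radius of a center."""
    cx, cy = center
    return sum((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x, y in points)


rng = random.Random(1)   # arbitrary fixed seed
original = [(rng.uniform(-6, 6), rng.uniform(-6, 6)) for _ in range(3000)]
moved = morph(original, field)

before = [count_near(original, c) for c in CENTERS]
after = [count_near(moved, c) for c in CENTERS]
```

Comparing `before` and `after` shows how many more points end up close to each center, although the original data had no cluster structure at all.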

Another effect to be avoided is shown in Figure 5, in
which a feature space containing two well-defined clusters
has been mapped into a single cluster by applying
a displacement field, shown in Fig. 5(c), that does not depend
on y and whose vectors have magnitudes defined by a
one-dimensional Gaussian centered at the origin of the
coordinate system.

This situation illustrates that feature
transformation/normalization can impact the separation and shape
of clusters.
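
A sketch of this merging effect is given below in Python. The field depends only on x, as stated in the text. Its precise form (the horizontal vector toward x = 0 scaled by a Gaussian weight), the width `SIGMA`, the cluster positions and sizes, and the seed are assumptions for illustration only.

```python
import math
import random

SIGMA = 3.0   # assumed width of the one-dimensional Gaussian


def field(x, y):
    """Horizontal displacement, independent of y, pulling points toward x = 0."""
    w = math.exp(-x * x / (2 * SIGMA ** 2))
    return -w * x, 0.0


def morph(points, field):
    """Move every point by the displacement given by field(x, y)."""
    moved = []
    for x, y in points:
        dx, dy = field(x, y)
        moved.append((x + dx, y + dy))
    return moved


def make_cluster(cx, n, rng):
    """Uniform points in a 1 x 2 box centered at (cx, 0); sizes are assumed."""
    return [(cx + rng.uniform(-0.5, 0.5), rng.uniform(-1, 1)) for _ in range(n)]


def mean_x(points):
    """Mean horizontal coordinate of a set of points."""
    return sum(x for x, _ in points) / len(points)


rng = random.Random(2)   # arbitrary fixed seed
left = make_cluster(-1.5, 500, rng)
right = make_cluster(1.5, 500, rng)
left_m, right_m = morph(left, field), morph(right, field)

gap_before = mean_x(right) - mean_x(left)      # close to 3.0
gap_after = mean_x(right_m) - mean_x(left_m)   # much smaller
```

With these settings the horizontal separation between the cluster centers shrinks to a small fraction of its original value while the vertical extent is untouched, so the two groups become hard to tell apart.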

Despite the possible problems illustrated above, feature
transformations and normalizations are often very
useful when applied with due care and attention, as
they can help to emphasize clusters, control noise and
bias/distortions in the original data, as well as reduce
the dimension of the feature space.

The remainder of this work presents some of the feature
transformations and normalizations often adopted in pattern
recognition, but before that we characterize two main
groups of transformations: those depending only on each
feature (independent), and those involving combinations of
features. The general forms of these two types of
transformations are given in Equations 6 and 7, respectively.

For instance, the transformation in Equation 5 is an
independent transformation. Observe also that the
transformation scalar fields, e.g. $S_x$ and $S_y$, will necessarily be
of the same type as the respectively associated
displacement vector fields, e.g. $D_x$ and $D_y$.

Figure 3: Action of the transformation defined by Eq. 5. The initial set of uniformly distributed points (a) is transformed through
the displacement field in (b) into a new set of points (c) that exhibits enhanced density at the respective borders. Observe that this
transformation also changes the initially circular shape of the border of the point distribution. The magnitude of the displacement field is
shown at a fraction (0.5) of its original value in order not to clutter the visualization.

Figure 4: Feature transformations, as illustrated here, can inadvertently create clusters that were not present in the uniformly distributed
original data. Each of the concentration centers was induced by a respective radial displacement field (pointing toward the respective center)
with Gaussian magnitudes.

4 MinMax Normalization 

Given a feature (or measurement) $f_i$ varying in the
interval $[f_{i,\min}, f_{i,\max}]$, a new respective version of this feature,
$\tilde{f}_i$, varying in the interval [0, 1] can be obtained by
applying the following independent feature transformation:

$$
\tilde{f}_i = \frac{f_i - f_{i,\min}}{f_{i,\max} - f_{i,\min}} \tag{8}
$$

For simplicity’s sake, this normalization transformation 
will be henceforth called minmax normalization in the 
present work. 
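
As a brief illustration, the minmax normalization of a single feature can be sketched in Python as follows. The function name, the guard against constant features and the sample values are assumptions added for the example.

```python
def minmax(values):
    """Minmax normalization of one feature: maps the values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature: the interval has zero length")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature values (e.g. fruit weights in grams).
weights = [120.0, 150.0, 180.0, 240.0]
normalized = minmax(weights)

# Respective displacement of each value, as in the morphing view.
displacements = [n - w for n, w in zip(normalized, weights)]
```

Because each feature is normalized using only its own minimum and maximum, applying `minmax` separately to every feature is an independent transformation in the sense used in this work.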

Figure 6 illustrates the minmax feature normalization
with respect to two clusters of uniformly distributed
points in (a). The respective displacement, shown in (b),
implies a vertical expansion of the clusters, resulting in
the elongated clusters in (c). This new shape of the two
clusters can have an impact on the subsequent classification
stage.

Figure 5: Two well-separated clusters can be merged as a consequence of certain feature transformations such as that applied in this example.

Figure 6: The minmax normalization of two well-separated clusters of uniformly distributed points (a). Elongated clusters are obtained (c)
as a consequence of the vertical expansion implied by the respective displacement field (b).


5 Unit Transformations 

We have already briefly discussed in the introduction of
this work that the units in which the features are taken
can influence the respective classification, possibly
requiring transformation and normalization. Given a
measurement $f_i$ varying in the interval $[f_{i,\min}, f_{i,\max}]$, it is
possible to linearly transform it into another measurement
$f_j$ varying in the interval $[f_{j,\min}, f_{j,\max}]$ by applying
the following equation:

$$
f_j = (f_{j,\max} - f_{j,\min}) \, \frac{f_i - f_{i,\min}}{f_{i,\max} - f_{i,\min}} + f_{j,\min} \tag{9}
$$

This transformation can be understood as a minmax
normalization, yielding the new variation interval [0, 1],
followed by a scaling product by $(f_{j,\max} - f_{j,\min})$ and
a translating addition of $f_{j,\min}$. Observe that this
transformation assumes a linear relationship between the
two features of interest.

For instance, the conversion from Celsius (C) to Fahrenheit
(F) can be obtained by considering the respective
intervals [0, 100] and [32, 212] as:

$$
F = (212 - 32) \, \frac{C - 0}{100 - 0} + 32 = 1.8 \, C + 32 \tag{10}
$$

Similarly, the conversion from Fahrenheit to Celsius can
be immediately obtained as:

$$
C = (100 - 0) \, \frac{F - 32}{212 - 32} + 0 = \frac{1}{1.8} (F - 32) \tag{11}
$$
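
The same linear mapping between intervals can be written as a small Python function. The function and constant names below are hypothetical, and the intervals are those used in the temperature example.

```python
def rescale(value, src, dst):
    """Linearly map value from the interval src = (min, max) to dst = (min, max)."""
    (s_min, s_max), (d_min, d_max) = src, dst
    return (d_max - d_min) * (value - s_min) / (s_max - s_min) + d_min


CELSIUS = (0.0, 100.0)
FAHRENHEIT = (32.0, 212.0)

f = rescale(37.0, CELSIUS, FAHRENHEIT)   # same as 1.8 * 37 + 32
c = rescale(f, FAHRENHEIT, CELSIUS)      # back to approximately 37
```

Swapping the two interval arguments gives the inverse conversion, which is why a single function covers both directions.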

6 Standardization 

Standardization of a measurement $f_i$ involves subtracting
its mean $\mu_{f_i}$, followed by a division by its respective
standard deviation $\sigma_{f_i}$, i.e.:

$$
\tilde{f}_i = \frac{f_i - \mu_{f_i}}{\sigma_{f_i}} \tag{12}
$$

This statistical transformation of a feature yields a new
feature $\tilde{f}_i$ that is dimensionless and that has zero mean
and unit standard deviation (e.g. 0). In addition, most
of the instances of the new measurement $\tilde{f}_i$ will tend
to fall within the interval [-2, 2].

The standardization of a feature $f_i$ can be understood
as moving the center of the respective distribution to zero
(a translation in the feature space), accompanied by a
scale normalization in which the dispersion of the
measurements becomes fixed or standardized.

Because standardization of a feature yields a respective
dimensionless feature, this operation is often applied
in order to normalize the influence of unit choices on the
respective classification.

Figure 7 illustrates the standardization of two clusters
composed of uniformly distributed points in a
two-dimensional feature space (the same situation
as in the previous example). The respective displacement
field, shown in (b), is similar to that of the minmax
normalization but has more intense magnitudes, as the points
undergo larger movements. Observe, however, that the
magnitude of the displacements should often be considered
in relative terms between the involved displacements
(e.g. even larger magnitudes would be obtained if the
clusters were further away from the coordinate origin,
but the result would still be the same). As
with the minmax normalization, the two clusters end up
with an elongated shape that can impact the subsequent
processing.
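
The unit independence provided by standardization can be checked with a short Python sketch. The use of the population standard deviation, the sample weights and the grams-per-ounce factor are assumptions made for the example.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population form; an assumption
    return [(v - mu) / sigma for v in values]


grams = [120.0, 150.0, 180.0, 240.0]    # hypothetical fruit weights
ounces = [g / 28.3495 for g in grams]   # the same weights in ounces

z_grams = standardize(grams)
z_ounces = standardize(ounces)
```

Up to floating-point rounding, `z_grams` and `z_ounces` coincide, which illustrates why the choice of unit no longer influences the standardized feature.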


7 Principal Component Analysis (PCA)

Given a set of N entities, each represented by a respective
feature vector $\vec{f}_p = [f_{1,p}, f_{2,p}, \ldots, f_{M,p}]^T$ (M features),
with p = 1, 2, ..., N, it is possible to apply a transformation
that completely decorrelates these features, allowing
a possible dimensionality reduction, in the sense of yielding
a new set of m features such that m < M while
implying little loss of variation. This can be achieved
by using the statistical transformation typically known as
principal component analysis (PCA, e.g. [6]).

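We start by deriving the covariance matrix K of the
feature vectors (understood as instances of a random vector),
which in many cases can be estimated as

$$
K_{ij} = \mathrm{covariance}(f_i, f_j) = \frac{1}{N-1} \sum_{p=1}^{N} \left(f_{i,p} - \mu_{f_i}\right) \left(f_{j,p} - \mu_{f_j}\right) \tag{13}
$$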


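Once this covariance matrix is obtained, its eigenvalues
and eigenvectors are calculated. The eigenvalues are then
sorted in decreasing order, yielding $\lambda_1, \lambda_2, \ldots, \lambda_M$, with
respectively associated eigenvectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_M$. The
latter are stacked as rows of an M x M matrix Q, i.e.:

$$
Q = \begin{bmatrix}
\leftarrow \vec{v}_1 \rightarrow \\
\leftarrow \vec{v}_2 \rightarrow \\
\vdots \\
\leftarrow \vec{v}_M \rightarrow
\end{bmatrix}
\tag{14}
$$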


The new feature vectors, $\tilde{\vec{f}}_p$, can now be obtained by
the PCA transformation as:

$$
\tilde{\vec{f}}_p = Q \vec{f}_p \tag{15}
$$

This corresponds to a linear transformation, implying
that the new feature vectors are obtained as linear
combinations of the original ones. Furthermore, this
transformation can be understood as a rotation of the original
feature space so that the data variance is concentrated
along the first new axes, corresponding to the principal
new features. The variance of the new features is given
by the eigenvalues associated with the respective axes.

It can be shown (e.g. 0) that PCA yields a new set
of features that are fully uncorrelated, implying a respective
reduction of redundancy. As a consequence, the new
covariance matrix will necessarily be diagonal.
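
The PCA steps above can be sketched for the two-dimensional case in plain Python, where the eigenvectors of the symmetric 2 x 2 covariance matrix follow from the orientation angle of the first principal axis. This closed-form shortcut, the 1/(N-1) covariance normalization, the synthetic elongated cloud and all names are assumptions made for illustration. For M > 2 a numerical eigensolver would be needed instead.

```python
import math
import random


def covariance_matrix(points):
    """Sample covariance (1/(N-1) normalization) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    kxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    kyy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    kxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return kxx, kxy, kyy


def pca_2d(points):
    """Return Q (eigenvectors as rows, decreasing eigenvalue) and the eigenvalues."""
    kxx, kxy, kyy = covariance_matrix(points)
    theta = 0.5 * math.atan2(2 * kxy, kxx - kyy)   # angle of the first principal axis
    c, s = math.cos(theta), math.sin(theta)
    Q = [(c, s), (-s, c)]
    lam1 = kxx * c * c + 2 * kxy * c * s + kyy * s * s
    lam2 = kxx * s * s - 2 * kxy * c * s + kyy * c * c
    return Q, (lam1, lam2)


def transform(Q, points):
    """Multiply every point, taken as a column vector, by the matrix Q."""
    return [(Q[0][0] * x + Q[0][1] * y, Q[1][0] * x + Q[1][1] * y) for x, y in points]


# Hypothetical elongated cloud: wide along the 45-degree diagonal, narrow across it.
rng = random.Random(3)   # arbitrary fixed seed
points = []
for _ in range(1000):
    u, v = rng.uniform(-2, 2), rng.uniform(-0.3, 0.3)
    points.append(((u - v) / math.sqrt(2), (u + v) / math.sqrt(2)))

Q, (lam1, lam2) = pca_2d(points)
new_points = transform(Q, points)
new_kxx, new_kxy, new_kyy = covariance_matrix(new_points)
```

In this sketch the off-diagonal term `new_kxy` vanishes up to rounding, and `new_kxx` and `new_kyy` match the two eigenvalues, in agreement with the diagonal covariance matrix mentioned in the text.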

We can define the covariance explanation index $\eta$
provided by the first m new features as:

$$
\eta = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \tag{16}
$$


If we set a value for $\eta$, such as 90%, we can keep only
the m features required for achieving at least that
covariance explanation index, thus achieving a reduction of
dimensionality from M to m. This is possible because, by
removing covariance, PCA also decreases the redundancy
of the features.
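
The choice of m from a target explanation index can be sketched in Python as below. The function names are hypothetical, and the eigenvalues are those reported in the caption of Figure 8.

```python
def explanation_index(eigenvalues, m):
    """Fraction of the total variance captured by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)


def components_needed(eigenvalues, target):
    """Smallest m whose explanation index reaches the target (e.g. 0.9)."""
    for m in range(1, len(eigenvalues) + 1):
        if explanation_index(eigenvalues, m) >= target:
            return m
    return len(eigenvalues)


# Eigenvalues of the two new axes as reported in the caption of Figure 8.
eta = explanation_index([0.272, 0.011], 1)    # about 0.96
m = components_needed([0.272, 0.011], 0.90)   # one axis suffices here
```

Sorting inside `explanation_index` makes the function insensitive to the order in which the eigenvalues are supplied.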

It is often interesting to standardize the features (see
Section 6) prior to PCA.

Figure 8 illustrates the effect of PCA on an elongated set
of points. Observe that the first new axis, $\tilde{x}$, becomes aligned
with the direction of largest variation in the original set
of points.


8 Concluding Remarks 

Pattern recognition involves several stages, including
feature transformation and normalization. In this work, we
addressed the latter operations with the help of the
concept of morphing a set of points and the visualization of
respective displacement fields. In addition to reviewing some
of the main transformations and normalizations, it has
also been shown that these operations can substantially
influence the subsequent stage of classification. Depending
on the type of data and transformation/normalization,
we can either enhance or undermine cluster identification.
Therefore, great care should be taken while transforming
and normalizing features.

Figure 7: The standardization of two well-separated clusters of uniformly distributed points (a). As with the minmax normalization, elongated
clusters are obtained (c) as a consequence of the vertical expansion implied by the respective displacement field (b).

Figure 8: Illustration of PCA action on an elongated set of uniformly
distributed points. Observe that the first new axis, $\tilde{x}$, aligns itself
along the direction of maximum variation in the original data. The
eigenvalues associated with the two new axes are also shown, which
yields a variance explanation of $\eta$ = 0.272/(0.272 + 0.011) = 96%
when keeping only the first new axis.
Costa’s Didactic Texts - CDTs

CDTs intend to be a halfway point between a
formal scientific article and a dissemination text
in the sense that they: (i) explain and illustrate
concepts in a more informal, graphical and accessible
way than the typical scientific article; and
(ii) provide more in-depth mathematical developments
than a more traditional dissemination work.

It is hoped that CDTs can also incorporate new
insights and analogies concerning the reported
concepts and methods. We hope these characteristics
will contribute to making CDTs interesting
both to beginners and to more senior
researchers.

Each CDT focuses on a limited set of interrelated
concepts. Though attempting to be relatively
self-contained, CDTs also aim at being relatively
short. Links to related material are provided in
order to complement the covered subjects.

Observe that CDTs, which come with absolutely
no warranty, are non distributable and for
non-commercial use only.

The complete set of CDTs can be found
at: https://www.researchgate.net/project/Costas-Didactic-Texts-CDTs.





